DOCUMENT RESUME 



ED 417 200                    TM 028 141

AUTHOR            Johnson, Eugene G.; Lazer, Stephen; O'Sullivan, Christine Y.
TITLE             NAEP Reconfigured: An Integrated Redesign of the National
                  Assessment of Educational Progress. Working Paper Series.
INSTITUTION       Educational Testing Service, Princeton, NJ.; Westat, Inc.,
                  Rockville, MD.
SPONS AGENCY      National Center for Education Statistics (ED), Washington,
                  DC.
REPORT NO         NCES-WP-97-31
PUB DATE          1997-08-00
NOTE              248p.
AVAILABLE FROM    U.S. Department of Education, Office of Educational Research
                  and Improvement, National Center for Education Statistics,
                  555 New Jersey Avenue, N.W., Room 400, Washington, DC
                  20208-5654.
PUB TYPE          Reports - Evaluative (142)
EDRS PRICE        MF01/PC10 Plus Postage.
DESCRIPTORS       *Data Collection; Elementary Secondary Education;
                  *Integrated Activities; *Measurement Techniques; National
                  Surveys; *Research Design; Research Reports; *Scoring
IDENTIFIERS       *National Assessment of Educational Progress



ABSTRACT 



Chapters in this report outline the potential plans for the
redesign of the National Assessment of Educational Progress (NAEP). It is
argued that any successful redesign must consider the NAEP as a whole. This
report reviews overall NAEP designs and discusses the implications that each
of the designs has for various functional areas. The following chapters are
included: (1) "Introduction" (Eugene G. Johnson and Stephen Lazer); (2) "An
Integrated Approach to the Redesign of NAEP" (Eugene G. Johnson); (3)
"Potential Designs for NAEP" (Eugene G. Johnson); (4) "Measuring Cognitive
Skills" (Stephen Lazer, Robert J. Mislevy, Kim R. Whittington, and William
Ward); (5) "Measuring Contextual Information" (Gita Z. Wilder); (6)
"Sampling" (Keith Foster Rust and Juliet Popper Shaffer); (7) "Data
Collection" (Nancy W. Caldwell); (8) "Scoring" (Christine Y. O'Sullivan); (9)
"Analysis" (Eugene G. Johnson and James E. Carlson); and (10) "Reporting"
(Stephen Lazer and Eugene G. Johnson). An appendix contains the policy
statement on redesigning the NAEP from the National Assessment Governing
Board and "An Operational Vision for NAEP--Year 2000 and Beyond" from the
National Center for Education Statistics. (SLD)



********************************************************************************
* Reproductions supplied by EDRS are the best that can be made
* from the original document.






NATIONAL CENTER FOR EDUCATION STATISTICS 

Working Paper Series



NAEP RECONFIGURED:

An Integrated Redesign of the 
National Assessment of Educational Progress 



Working Paper No. 97-31 



October 1997 



























U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)
This document has been reproduced as
received from the person or organization
originating it.

□ Minor changes have been made to
improve reproduction quality.

• Points of view or opinions stated in this
document do not necessarily represent
official OERI position or policy.



U.S. Department of Education

Office of Educational Research and Improvement






NAEP RECONFIGURED: 



An Integrated Redesign of the 
National Assessment of Educational Progress 



Working Paper No. 97-31 October 1997 



Contact: Steven Gorman 

Assessment Group 
(202) 219-1937 

e-mail: steven_gorman@ed.gov 



U.S. Department of Education 

Richard W. Riley 

Secretary 

Office of Educational Research and Improvement 

Ricky T. Takai 
Acting Assistant Secretary 

National Center for Education Statistics 
Pascal D. Forgione, Jr. 

Commissioner 

Assessment Group 
Gary W. Phillips 
Associate Commissioner 



The National Center for Education Statistics (NCES) is the primary federal entity for collecting, analyzing, 
and reporting data related to education in the United States and other nations. It fulfills a congressional 
mandate to collect, collate, analyze, and report full and complete statistics on the condition of education 
in the United States; conduct and publish reports and specialized analyses of the meaning and significance 
of such statistics; assist state and local education agencies in improving their statistical systems; and review 
and report on education activities in foreign countries. 

NCES activities are designed to address high-priority education data needs; provide consistent, reliable,
complete, and accurate indicators of education status and trends; and report timely, useful, and high-quality
data to the U.S. Department of Education, the Congress, the states, other education policymakers, 
practitioners, data users, and the general public. 

We strive to make our products available in a variety of formats and in language that is appropriate to a 
variety of audiences. You, as our customer, are the best judge of our success in communicating 
information effectively. If you have any comments or suggestions about this or any other NCES product 
or report, we would like to hear from you. Please direct your comments to: 

National Center for Education Statistics 
Office of Educational Research and Improvement 
U.S. Department of Education 
555 New Jersey Avenue, NW 
Washington, DC 20208 



Suggested Citation 

U.S. Department of Education. National Center for Education Statistics. NAEP Reconfigured: An Integrated Redesign of the 
National Assessment of Educational Progress, Working Paper No. 97-31, by Eugene G. Johnson, Stephen Lazer, and 
Christine Y. O’Sullivan. Project Officer, Steven Gorman. Washington, D.C.: 1997. 



October 1997 



Foreword 



Each year NCES staff and individuals commissioned by NCES generate a large number of
written documents that provide preliminary analyses of survey results and
address technical, methodological, and evaluation issues. Even though they are not formally 
published, these documents reflect a tremendous amount of unique expertise, knowledge, and 
experience. 

The Working Paper Series was created in order to preserve the information contained 
in these documents and to promote the sharing of valuable work experience and knowledge. 
However, these documents were prepared under different formats and did not undergo 
rigorous NCES publication review and editing prior to their inclusion in the series.
Consequently, we encourage users of the series to consult the individual authors for citations. 

To receive information about submitting manuscripts or obtaining copies of the series, 
please contact Ruth R. Harris at (202) 219-1831 or U.S. Department of Education, Office of 
Educational Research and Improvement, National Center for Education Statistics, 555 New 
Jersey Ave., N.W., Room 400, Washington, D.C. 20208-5654. 



Samuel S. Peng 
Acting Director 

Statistical Standards and Services Group 



NATIONAL CENTER FOR EDUCATION STATISTICS 



NAEP Reconfigured: 

An Integrated Redesign of the 
National Assessment of Educational Progress 



Eugene G. Johnson 
Stephen Lazer 
Christine Y. O’Sullivan 



In collaboration with: 



Nancy Allen 
John Barone 
Johnny Blair 
Henry I. Braun 
John Burke 
Nancy W. Caldwell 
James E. Carlson
John Donoghue 
Elizabeth Durkin 
Carol Errickson 
John J. Ferris 
John Fremer 
David S. Freund 



Lauren G. Fried 
James Green 
Elissa Greenwald 
Lynn Jenkins 
Graham Kalton 
Debra L. Kline 
Barbara M. Klish 
Robert Linn 
Rosemary A. Loeb 
John Mazzeo 
Robert J. Mislevy 
Dori Nielson 
Norma Norris 



Paul Ramsey 
Linda L. Reynolds 
Keith Foster Rust 
Terry L. Schoeps 
Juliet Popper Shaffer 
Gerald Shelton 
Brent Studer 
Bradley J. Thayer 
David Thissen 
William C. Ward 
Kim R. Whittington 
Gita Z. Wilder 
Paul L. Williams 



August 1997 



Prepared by Educational Testing Service and Westat 
under a cooperative agreement with the 
National Center for Education Statistics 

U.S. Department of Education 
Office of Educational Research and Improvement 



- Table of Contents - 



Chapter 1: Introduction 1-1

by Eugene G. Johnson and Stephen Lazer 

The NAGB/NCES Redesign Initiative 1-4 

The Unifying Themes for this Report 1-6 

Chapter 2: An Integrated Approach to the Redesign of NAEP 2-1 

by Eugene G. Johnson 

Making Choices Among Conflicting Goals 2-1 

Integration Rather Than Local Optimization 2-2 

About the Following Chapters 2-9 

Chapter 3: Potential Designs for NAEP 3-1

by Eugene G. Johnson

The Current NAEP

A Streamlined NAEP

A Modular NAEP 3-7

A Parallel-Forms NAEP

Parallel Forms As a Module 3-13

Parallel Forms As the Core 3-13

Other Issues 3-15

Recommendations for the Overall Design 3-16 

Chapter 4: Measuring Cognitive Skills 4-1

by Stephen Lazer, Robert J. Mislevy, Kim R. Whittington, and William Ward

Introduction 

Measuring Cognitive Skills in the Current NAEP 4-2 

The "Appropriate Mix" of Multiple-Choice and Constructed-Response Items 4-6 

Advantages and Disadvantages of Multiple-Choice Items 4-8 

Advantages and Disadvantages of Constructed-Response Items 4-8 

Performance Tasks, Content Validity, and NAEP: An Evidentiary Perspective 4-10

The Mix of Item Types, Modularity, Cost, and Schedule 4-28 

Using New Technologies in Testing 4-30 

Computerized Adaptive Testing Designed to Produce NAEP Scale Scores 4-30 

Computer-Based Testing Designed to Assess Skills Not Amenable to Pencil-and-Paper

Testing or to Introduce Efficiencies 4-37

Designing a CBT Delivery System for NAEP 4-39

Recommendations on CBT and NAEP 4-41

Cognitive Instrumentation: Areas in Which Current Practices Affect the System 4-42

Limitation of Student Testing Time 4-42

Use of BIB Spiraling 4-44

Uses of Field Testing 4-44

Local Optimizations That Will Profit Any Model of NAEP 4-47

Cognitive Testing Under Different NAEP Models 4-48

Cognitive Testing in a Streamlined NAEP 4-48

Cognitive Testing in a Modular NAEP 4-49

Cognitive Testing in a Parallel-Forms NAEP 4-49

Recommendations for Cognitive Testing 4-51



Chapter 5: Measuring Contextual Information 5-1 

by Gita Z. Wilder 

Introduction 

Measurement of Contextual Information in NAEP Today 5-2 

Issues in Measuring Contextual Information 5-3 

Validity and Quality 5-5

Burden 5-12

Use of Data from Other Sources 5-13

Interactions Between Measuring Contextual Information and Other Program Areas 5-15 

Measuring Contextual Information in Different NAEP Models 5-15 

Measuring Contextual Information in a Streamlined NAEP 5-15 

Measuring Contextual Information in a Modular NAEP 5-16 

Measuring Contextual Information in a Parallel-Forms NAEP 5-17 

Recommendations for Measuring Contextual Information 5-17 



Chapter 6: Sampling 6-1

by Keith Foster Rust and Juliet Popper Shaffer

Introduction 6-1

The Sample Design of the Current NAEP 6-1

Main NAEP Versus Long-Term Trend 6-2

Issues in NAEP Sampling 6-4

The Combination of State and National Main Assessment Samples 6-5

The Targeted Assessment of Specific Groups of Students, Particularly Those

in Certain Proficiency Categories 6-12

The Oversampling of Particular Population Subgroups 6-13

The Use of Panels of Schools to Enhance the Reliability of Trend Reporting 6-14

The Use of Auxiliary Information About Schools and Students

to Improve the Efficiency of Samples 6-18

The Broadening of the Scope of the Assessment to Include Age-Appropriate Students

in Ungraded Settings 6-22

The Investigation of Adjustments to Sample-Weighting Procedures 6-23

Interactions Between Sampling and the Other Program Areas 6-24

Recommendations for Sampling 6-25



Chapter 7: Data Collection 7-1

by Nancy W. Caldwell

Introduction 7-1

Data Collection in the Current NAEP 7-1

Issues in Data Collection 7-2

Reducing Costs 7-2

Minimizing Burden 7-3

Interactions Between Data Collection and Other Program Areas 7-3

Sampling Procedures and Data Collection 7-3

Measuring Contextual Information and Data Collection 7-5

Measuring Cognitive Skills and Data Collection 7-7

Reporting and Data Collection 7-8

Data Collection Under Different NAEP Designs 7-8

Data Collection in a Streamlined NAEP 7-8

Data Collection in a Modular NAEP 7-9

Data Collection in a Parallel-Forms NAEP 7-9

Recommendations for Data Collection 7-10



Chapter 8: Scoring 8-1 

by Christine Y. O'Sullivan 

Introduction 8-1 

Computer-Based (Image) Scoring 8-2 

Costs 8-3 

Benefits 8-3 

Remote-Site Electronic Scoring 8-4 

Costs 8-5 

Benefits 8-6 

Image Scoring Under the Alternate NAEP Designs 8-6 

Automated Scoring 8-6 

The Role of Automated Scoring in NAEP 8-9

Costs 8-10

Benefits 8-10

Interactions of Automated Scoring with Other Program Areas in the Current NAEP 8-11

Automated Scoring in a Streamlined NAEP 8-11

Automated Scoring in a Modular NAEP 8-11

Rater Reliability 8-12 

The Current NAEP System 8-13 

Costs 8-14 

Benefits 8-14 

Impact on Alternative Designs 8-14 

Recommendations for Scoring 8-15 

Chapter 9: Analysis 

by Eugene G. Johnson and James E. Carlson 

Introduction 

Analysis Procedures in the Current NAEP 9-2 

Issues Related to Analysis 9-5

Lengthening Testing Time 9-5

Precalibration of Items 9-6

Two-Phase Analysis 9-12

Market-Basket Based Analysis 9-13

Rule-Space Analysis 9-14

New Item Response Theory Applications and Other Model-Based Procedures 9-14

Nonresponse Adjustments 9-17 

Techniques Which May or May Not Lead to Efficiencies or Cost Savings 9-17 

Eliminate IRT Scaling 9-18 

Eliminate Plausible Values 9-23 

Interactions Between Analysis Procedures and the Other Program Areas 9-24 

Recommendations for Analysis 9-25 

Chapter 10: Reporting 10-1 

by Stephen Lazer and Eugene G. Johnson 

Introduction 10-1 

Reporting in the Current NAEP 10-1 

Market-Basket Reporting 10-3 

Ways in Which Current Processes Impact the Release of Reports 10-7 

Whether or Not Current Analysis or Instrumentation Is Appropriate 

to Support New Reporting Goals 10-8 

The Nature and Amount of NAEP Reporting 10-8 

Recommendations for Reporting 10-10 






Appendix 

Policy Statement on Redesigning NAEP 

National Assessment Governing Board 
An Operational Vision for NAEP--Year 2000 and Beyond
National Center for Education Statistics 



Acknowledgments 



Chapter 1 
Introduction 

Executive Summary 



This chapter discusses the general purpose of this report, which is to outline the
potential plans for the redesign of the National Assessment of Educational Progress
(NAEP). We view NAEP as an integrated system in which a change in any one of the
functional areas (cognitive measurement, contextual questionnaire development,
sampling, data collection, scoring, analysis, and reporting) will have an impact on the
others. Thus, we argue that any successful redesign effort must consider NAEP as a
whole. Our report considers overall NAEP designs and discusses the implications that
each of these designs has for the various functional areas.






CHAPTER 1: INTRODUCTION 



- Eugene G. Johnson / Stephen Lazer - 

For 27 years the National Assessment of Educational Progress (NAEP) has 
served as the nation's primary indicator of what students know and can do. Based on 
state-of-the-art measurement techniques, integrated use of cognitive and background 
questions, and representative national samples, NAEP has served as the country's best 
provider of reliable, objective information on student performance and on trends in
academic achievement. NAEP data and reports are currently used in a variety of arenas 
and have informed the various debates about educational reform in the United States. 

Over the three decades of its existence, the National Assessment has become one 
of the most innovative and successful surveys regularly conducted in the United States. 
NAEP has been asked to meet a wide variety of goals and priorities, and these have 
imposed constraints and demands faced by no other educational assessment program. 
The National Assessment has been called on to measure student knowledge of broad 
content domains and to gather in-depth contextual information, at the same time 
minimizing the burden faced by individual participants. NAEP has pioneered the use 
of performance assessment methodologies in large-scale settings, and NAEP staff have 
determined ways to use computerized image-processing technologies to score 
performance exercises in a cost-effective and statistically reliable manner. 
Psychometricians working on NAEP have developed procedures that allow for the 
combination of multiple-choice and performance measures into integrated scales. 
National Assessment analysts, programmers, and authors have developed artificial 
intelligence systems that generate computer-written natural-language reports for states 
that participate in NAEP. Overall, NAEP has become a gold standard: a model of 
innovation and accuracy. 

However, it is perhaps NAEP's very successes that have created some of the 
strains that have led to the current redesign initiative. Because NAEP has shown a
consistent ability to satisfy program goals, new priorities have arisen — priorities that 
have often been in conflict with other program imperatives. In the late 1980s and early 
1990s, NAEP was simultaneously asked to increase its use of performance assessment 
exercises and to test larger numbers of students as part of state samples. New
definitions of assessment content in National Assessment Governing Board
(NAGB) Frameworks necessitated assessments involving both multiple-choice and 
constructed-response questions and the combination of these item-types in core 
reporting scales. NAEP was called on to measure trends and to reflect the best and most 
up-to-date curricular practices. NAEP was asked to provide timely information for 
policymakers and to allow in-depth analyses by education researchers in various 
subject disciplines. The publication of America 2000: An Education Strategy and the 
related work of the National Education Goals Panel increased the relevance of NAEP 
data and led to demands for more timely and frequent reporting; these demands came 
precisely at the time that the National Assessment was becoming more complex and 
expensive to administer. These developments and imperatives tended to interact and 
intensify: NAEP's increasing visibility and proven record of success led policymakers
and educators to view the National Assessment as a vehicle of curricular reform and to 
demand even greater innovation in instrument design. 

Overall, NAEP was called on to do more in an era of level funding. 
Concomitantly, the program's new priorities in no way excused it from its historical 
imperatives of providing the American public with statistical data of the highest 
quality, minimizing individual respondent burden, protecting participant 
confidentiality, responding rapidly to changes in policy, and allowing Department of 
Education policymakers maximum flexibility in their decision-making processes. 

Despite the many and varied challenges that NAEP has faced, the program has 
continued to meet the majority of its goals. NAEP instruments, sampling designs, 
administration procedures, and psychometric methodologies have become models of 
innovation, yet have remained operationally and analytically feasible. NAEP reports 
have served the needs of a wide variety of audiences. The program's expansion to the
state level has made NAEP the benchmark against which the success of educational 
reform efforts is measured and new programs are planned. NAEP's management and
implementation have fostered a flexibility that has allowed time that was once needed 
for test development to be spent, instead, on building consensus about what is to be 
measured and how it is to be measured. In addition, this flexibility has enabled 
assessments to evolve within the context of maintaining trend data. And NAEP's 
matrix-sampled design has allowed content, rather than testing methodology, to be the 
driving factor in the construction of NAEP instruments. 

However, these successes have come at a price: trade-offs have had to be
made. With flexibility in schedule and instrument design has come analytic complexity.
With performance testing have come further complications of analysis, difficulty in 
trend determination and, especially in the state program, significant expense. 
Evolutionary changes in assessments have required special bridging studies whose 
analyses must fall on the critical work path. New assessment Frameworks have 
invariably posed new developmental and psychometric challenges. Together, these 
changes in the assessment have prevented NAEP from realizing the efficiencies 
associated with the operational consistencies of most testing programs. All these factors 
have tended to add both complexity and cost to NAEP. With complexity have come
lengthy reporting schedules. With expense have come limits on the number of
assessments that can be administered.

The realization that trade-offs are inherent in the design and conduct of the 
NAEP program has led many associated with NAEP to begin asking fundamental 
questions about the program's future directions. For example, do assessments that 
necessitate extensive use of performance testing, while feasible if administered to small 
national samples, prove prohibitively expensive in a state-level program? Can a 
National Assessment that is expensive to administer and score serve the state linking 
function which many now envision for it? Is it possible that re-crafting the current 
integrated NAEP structure in favor of a modular design — in which certain inexpensive 
instruments represent an assessment core while other, more innovative modules could
be given as needed and analyzed off the critical reporting path — might better serve the 
program's new missions? In general, can one instrument satisfy all the publics who 
may wish to use its results? These and other questions began to suggest to many that if 
the basic purposes and structures of NAEP were not reexamined, the program might 
not be able to fully meet all of its conflicting imperatives. 

The NAGB/NCES Redesign Initiative 

Realizing that NAEP faced questions about its basic mission and design, NAGB 
began a careful and thorough examination of the nature and purposes of the National 
Assessment and the ways it could be redesigned to better meet its goals. This process 
involved input from Board members and staff, an independent Design/Feasibility 
Team 1 made up of eminent psychometricians, and hundreds of concerned citizens. The 
result of this initiative was the NAGB adoption, on August 2, 1996, of the Policy
Statement on Redesigning the National Assessment of Educational Progress. This statement
argues that NAEP should have three core objectives that would serve as the means for
accomplishing its legislatively mandated purpose of providing a fair and accurate
presentation of educational achievement. These objectives are: 

(1) to measure national and state progress toward the third National Education 
Goal 2 and provide timely, fair, and accurate data about student achievement 
at the national level, among the states, and in comparison with other nations 

(2) to develop, through a national consensus, sound assessments to measure 
what students know and can do, as well as what students should know and 
be able to do 



1 The results of this team's work, published as Design/Feasibility Team: Report to the National Assessment Governing Board, had an
important influence on the NAGB Redesign Statement and have also played a major role in organizing the work in this redesign
planning effort.

2 The third National Education Goal, called "Student Achievement and Citizenship," states that: "By the year 2000 all students will
leave grades 4, 8, and 12 having demonstrated competency over challenging subject matter including English, mathematics, science,
foreign languages, civics and government, economics, arts, history, and geography, and every school in America will ensure that all
students learn to use their minds well, so they may be prepared for responsible citizenship, further learning, and productive
employment in our nation's modern economy." The National Education Goals Report: Building a Nation of Learners. National Education
Goals Panel. (1996). Washington, DC.



